<!DOCTYPE html>
<!-- saved from url=(0014)about:internet -->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta http-equiv="x-ua-compatible" content="IE=9" >

<title>Chapter 6: Exercise 7</title>

<style type="text/css">
body, td {
   font-family: sans-serif;
   background-color: white;
   font-size: 12px;
   margin: 8px;
}

tt, code, pre {
   font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace;
}

h1 { 
   font-size:2.2em; 
}

h2 { 
   font-size:1.8em; 
}

h3 { 
   font-size:1.4em; 
}

h4 { 
   font-size:1.0em; 
}

h5 { 
   font-size:0.9em; 
}

h6 { 
   font-size:0.8em; 
}

a:visited {
   color: rgb(50%, 0%, 50%);
}

pre {	
   margin-top: 0;
   max-width: 95%;
   border: 1px solid #ccc;
   white-space: pre-wrap;
}

pre code {
   display: block; padding: 0.5em;
}

code.r, code.cpp {
   background-color: #F8F8F8;
}

table, td, th {
  border: none;
}

blockquote {
   color:#666666;
   margin:0;
   padding-left: 1em;
   border-left: 0.5em #EEE solid;
}

hr {
   height: 0px;
   border-bottom: none;
   border-top-width: thin;
   border-top-style: dotted;
   border-top-color: #999999;
}

@media print {
   * { 
      background: transparent !important; 
      color: black !important; 
      filter:none !important; 
      -ms-filter: none !important; 
   }

   body { 
      font-size:12pt; 
      max-width:100%; 
   }
       
   a, a:visited { 
      text-decoration: underline; 
   }

   hr { 
      visibility: hidden;
      page-break-before: always;
   }

   pre, blockquote { 
      padding-right: 1em; 
      page-break-inside: avoid; 
   }

   tr, img { 
      page-break-inside: avoid; 
   }

   img { 
      max-width: 100% !important; 
   }

   @page :left { 
      margin: 15mm 20mm 15mm 10mm; 
   }
     
   @page :right { 
      margin: 15mm 10mm 15mm 20mm; 
   }

   p, h2, h3 { 
      orphans: 3; widows: 3; 
   }

   h2, h3 { 
      page-break-after: avoid; 
   }
}

</style>



<!-- MathJax scripts -->
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>



</head>

<body>
<h1>Chapter 6: Exercise 7</h1>

<h3>a</h3>

<p>The likelihood for the data is:</p>

<p>\[ 
\begin{aligned}
    L(\theta \mid \beta)
    &= p(\beta \mid \theta)
    \\
    &= p(\beta_1 \mid \theta)
    \times \cdots
    \times p(\beta_n \mid \theta)
    \\
    &= \prod_{i = 1}^{n}
    p(\beta_i \mid \theta)
    \\
    &= \prod_{i = 1}^{n}
    \frac{
        1
    }{
        \sigma \sqrt{2\pi}
    }
    \exp
    \left(-
        \frac{
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        }{
            2\sigma^2
        }
    \right)
    \\
    &=
    \left(
        \frac{
            1
        }{
            \sigma \sqrt{2\pi}
        }
    \right)^n
    \exp
    \left(
        - \frac{
            1
        }{
            2\sigma^2
        }
        \sum_{i = 1}^{n}
        \left[
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        \right]^2
    \right)
\end{aligned}
 \]</p>

<h3>b</h3>

<p>The posterior with double exponential (Laplace Distribution) with mean 0 and
common scale parameter \( b \), i.e. \( p(\beta) = \frac{1}{2b}\exp(- \lvert \beta
\rvert / b) \) is:</p>

<p>\[ 
    f(\beta \mid X, Y)
    \propto f(Y \mid X, \beta) p(\beta \mid X)
    = f(Y \mid X, \beta) p(\beta)
 \]</p>

<p>Substituting our values from (a) and our density function gives us:</p>

<p>\[ 
\begin{aligned}
    f(Y \mid X, \beta)p(\beta)
    &=
    \left(
        \frac{
            1
        }{
            \sigma \sqrt{2\pi}
        }
    \right)^n
    \exp
    \left(
        - \frac{
            1
        }{
            2\sigma^2
        }
        \sum_{i = 1}^{n}
        \left[
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        \right]^2
    \right)
    \left(
        \frac{
            1
        }{
            2b
        }
        \exp(- \lvert \beta \rvert / b)
    \right)
    \\
    &=
    \left(
        \frac{
            1
        }{
            \sigma \sqrt{2\pi}
        }
    \right)^n
    \left(
        \frac{
            1
        }{
            2b
        }
    \right)
    \exp
    \left(
        - \frac{
            1
        }{
            2\sigma^2
        }
        \sum_{i = 1}^{n}
        \left[
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        \right]^2
        -
        \frac{
            \lvert \beta \rvert
        }{
            b
            }
    \right)
\end{aligned}
 \]</p>

<h3>c</h3>

<p>Showing that the Lasso estimate for \( \beta \) is the mode under this posterior
distribution is the same thing as showing that the most likely value for
\( \beta \) is given by the lasso solution with a certain \( \lambda \).</p>

<p>We can do this by taking our likelihood and posterior and showing that it can
be reduced to the canonical Lasso Equation 6.7 from the book.</p>

<p>Let&#39;s start by simplifying it by taking the logarithm of both sides:</p>

<p>\[ 
\begin{aligned}
    \log
    f(Y \mid X, \beta)p(\beta)
    &=
    \log
    \left[
        \left(
            \frac{
                1
            }{
                \sigma \sqrt{2\pi}
            }
        \right)^n
        \left(
            \frac{
                1
            }{
                2b
            }
        \right)
        \exp
        \left(
            - \frac{
                1
            }{
                2\sigma^2
            }
            \sum_{i = 1}^{n}
            \left[
                Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
            \right]^2
            -
            \frac{
                \lvert \beta \rvert
            }{
                b
                }
        \right)
    \right]
    \\
    &=
    \log
    \left[
        \left(
            \frac{
                1
            }{
                \sigma \sqrt{2\pi}
            }
        \right)^n
        \left(
            \frac{
                1
            }{
                2b
            }
        \right)
    \right]
    -
    \left(
        \frac{
            1
        }{
            2\sigma^2
        }
        \sum_{i = 1}^{n}
        \left[
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        \right]^2
        +
        \frac{
            \lvert \beta \rvert
        }{
            b
        }
    \right)
\end{aligned}
 \]</p>

<p>We want to maximize the posterior, this means:
\[ 
\begin{aligned}
    \arg\max_\beta \, f(\beta \mid X, Y)
    &=
    \arg\max_\beta
    \,
    \log
    \left[
        \left(
            \frac{
                1
            }{
                \sigma \sqrt{2\pi}
            }
        \right)^n
        \left(
            \frac{
                1
            }{
                2b
            }
        \right)
    \right]
    -
    \left(
        \frac{
            1
        }{
            2\sigma^2
        }
        \sum_{i = 1}^{n}
        \left[
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        \right]^2
        +
        \frac{
            \lvert \beta \rvert
        }{
            b
            }
    \right)
    \\
\end{aligned}
 \]</p>

<p>Since we are taking the difference of two values, the maximum of this value is
the equivalent to taking the difference of the second value in terms of
\( \beta \). This results in:</p>

<p>\[ 
\begin{aligned}
    &=
    \arg\min_\beta
    \,
    \frac{
        1
    }{
        2\sigma^2
    }
    \sum_{i = 1}^{n}
    \left[
        Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
    \right]^2
    +
    \frac{
        \lvert \beta \rvert
    }{
        b
    }
    \\
    &=
    \arg\min_\beta
    \,
    \frac{
        1
    }{
        2\sigma^2
    }
    \sum_{i = 1}^{n}
    \left[
        Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
    \right]^2
    +
    \frac{
        1
    }{
        b
    }
    \sum_{j = 1}^{p} \lvert \beta_j \rvert
    \\
    &=
    \arg\min_\beta
    \,
    \frac{
        1
    }{
        2\sigma^2
    }
    \left(
        \sum_{i = 1}^{n}
        \left[
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        \right]^2
        +
        \frac{
            2\sigma^2
        }{
            b
        }
        \sum_{j = 1}^{p} \lvert \beta_j \rvert
    \right)
\end{aligned}
 \]</p>

<p>By letting \( \lambda = 2\sigma^2/b \), we can see that we end up with:</p>

<p>\[ 
\begin{aligned}
    &=
    \arg\min_\beta
    \,
    \sum_{i = 1}^{n}
    \left[
        Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
    \right]^2
    +
    \lambda
    \sum_{j = 1}^{p} \lvert \beta_j \rvert
    \\
    &=
    \arg\min_\beta
    \,
    \text{RSS}
    +
    \lambda
    \sum_{j = 1}^{p} \lvert \beta_j \rvert
\end{aligned}
 \]</p>

<p>which we know is the Lasso from Equation 6.7 in the book. Thus we know that
when the posterior comes from a Laplace distribution with mean zero and common
scale parameter \( b \), the mode for \( \beta \) is given by the Lasso solution when
\( \lambda = 2\sigma^2 / b \).</p>

<h3>d</h3>

<p>The posterior distributed according to Normal distribution with mean 0 and
variance \( c \) is:</p>

<p>\[ 
\begin{aligned}
    f(\beta \mid X, Y)
    \propto f(Y \mid X, \beta) p(\beta \mid X)
    = f(Y \mid X, \beta) p(\beta)
\end{aligned}
 \]</p>

<p>Our probability distribution function then becomes:
\[ 
        p(\beta)
        = \prod_{i = 1}^{p} p(\beta_i)
        = \prod_{i = 1}^{p}
        \frac{
            1
        }{
            \sqrt{
                2c\pi
            }
        }
        \exp \left(
            - \frac{
                \beta_i^2
                }{
                    2c
                }
        \right)
        = \left(
            \frac{
                1
            }{
                \sqrt{
                    2c\pi
                }
            }
        \right)^p
        \exp \left(
            - \frac{
                1
            }{
                2c
            }
            \sum_{i = 1}^{p} \beta_i^2
        \right)
 \]</p>

<p>Substituting our values from (a) and our density function gives us:</p>

<p>\[ 
\begin{aligned}
    f(Y \mid X, \beta)p(\beta)
    &=
    \left(
        \frac{
            1
        }{
            \sigma \sqrt{2\pi}
        }
    \right)^n
    \exp
    \left(
        - \frac{
            1
        }{
            2\sigma^2
        }
        \sum_{i = 1}^{n}
        \left[
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        \right]^2
    \right)
    \left(
        \frac{
            1
        }{
            \sqrt{
                2c\pi
            }
        }
    \right)^p
    \exp \left(
        - \frac{
            1
        }{
            2c
        }
        \sum_{i = 1}^{p} \beta_i^2
    \right)
    \\
    &=
    \left(
        \frac{
            1
        }{
            \sigma \sqrt{2\pi}
        }
    \right)^n
    \left(
        \frac{
            1
        }{
            \sqrt{
                2c\pi
            }
        }
    \right)^p
    \exp
    \left(
        - \frac{
            1
        }{
            2\sigma^2
        }
        \sum_{i = 1}^{n}
        \left[
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        \right]^2
        - \frac{
            1
        }{
            2c
        }
        \sum_{i = 1}^{p} \beta_i^2
    \right)
\end{aligned}
 \]</p>

<h3>e</h3>

<p>Like from part c, showing that the Ridge Regression estimate for \( \beta \) is
the mode and mean under this posterior distribution is the same thing as
showing that the most likely value for \( \beta \) is given by the lasso solution
with a certain \( \lambda \).</p>

<p>We can do this by taking our likelihood and posterior and showing that it can
be reduced to the canonical Ridge Regression Equation 6.5 from the book.</p>

<p>Let&#39;s start by simplifying it by taking the logarithm of both sides:</p>

<p>Once again, we can take the logarithm of both sides to simplify it:</p>

<p>\[ 
\begin{aligned}
    \log f(Y \mid X, \beta)p(\beta)
    &=
    \left(
        \frac{
            1
        }{
            \sigma \sqrt{2\pi}
        }
    \right)^n
    \left(
        \frac{
            1
        }{
            \sqrt{
                2c\pi
            }
        }
    \right)^p
    \exp
    \left(
        - \frac{
            1
        }{
            2\sigma^2
        }
        \sum_{i = 1}^{n}
        \left[
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        \right]^2
        - \frac{
            1
        }{
            2c
        }
        \sum_{i = 1}^{p} \beta_i^2
    \right)
    \\
    &=
    \log
    \left[
        \left(
            \frac{
                1
            }{
                \sigma \sqrt{2\pi}
            }
        \right)^n
        \left(
            \frac{
                1
            }{
                \sqrt{
                    2c\pi
                }
            }
        \right)^p
    \right]
    -
    \left(
        \frac{
            1
        }{
            2\sigma^2
        }
        \sum_{i = 1}^{n}
        \left[
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        \right]^2
        +
        \frac{
            1
        }{
            2c
        }
        \sum_{i = 1}^{p} \beta_i^2
    \right)
\end{aligned}
 \]</p>

<p>We want to maximize the posterior, this means:
\[ 
\begin{aligned}
    \arg\max_\beta \, f(\beta \mid X, Y)
    &=
    \arg\max_\beta
    \,
    \log
    \left[
        \left(
            \frac{
                1
            }{
                \sigma \sqrt{2\pi}
            }
        \right)^n
        \left(
            \frac{
                1
            }{
                \sqrt{
                    2c\pi
                }
            }
        \right)^p
    \right]
    -
    \left(
        \frac{
            1
        }{
            2\sigma^2
        }
        \sum_{i = 1}^{n}
        \left[
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        \right]^2
        +
        \frac{
            1
        }{
            2c
        }
        \sum_{i = 1}^{p} \beta_i^2
    \right)
\end{aligned}
 \]</p>

<p>Since we are taking the difference of two values, the maximum of this value is
the equivalent to taking the difference of the second value in terms of
\( \beta \). This results in:</p>

<p>\[ 
\begin{aligned}
    &=
    \arg\min_\beta
    \,
    \left(
        \frac{
            1
        }{
            2\sigma^2
        }
        \sum_{i = 1}^{n}
        \left[
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        \right]^2
        +
        \frac{
            1
        }{
            2c
        }
        \sum_{i = 1}^{p} \beta_i^2
    \right)
    \\
    &=
    \arg\min_\beta
    \,
    \left(
        \frac{
            1
        }{
            2\sigma^2
        }
    \right)
    \left(
        \sum_{i = 1}^{n}
        \left[
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        \right]^2
        +
        \frac{
            \sigma^2
        }{
            c
        }
        \sum_{i = 1}^{p} \beta_i^2
    \right)
\end{aligned}
 \]</p>

<p>By letting \( \lambda = \sigma^2/ c \), we end up with:</p>

<p>\[ 
\begin{aligned}
    &=
    \arg\min_\beta
    \,
    \left(
        \frac{
            1
        }{
            2\sigma^2
        }
    \right)
    \left(
        \sum_{i = 1}^{n}
        \left[
            Y_i - (\beta_0 + \sum_{j = 1}^{p} \beta_j X_{ij})
        \right]^2
        +
        \lambda
        \sum_{i = 1}^{p} \beta_i^2
    \right)
    \\
    &=
    \arg\min_\beta
    \,
    \text{RSS}
    +
    \lambda
    \sum_{i = 1}^{p} \beta_i^2
\end{aligned}
 \]</p>

<p>which we know is the Ridge Regression from Equation 6.5 in the book. Thus we
know that when the posterior comes from a normal distribution with mean zero
and variance \( c \), the mode for \( \beta \) is given by the Ridge Regression
solution when \( \lambda = \sigma^2 / c \). Since the posterior is Gaussian, we
also know that it is the posterior mean.</p>

</body>

</html>

